A three-dimensional approach to parallel matrix multiplication
Authors
Abstract
A three-dimensional (3D) matrix multiplication algorithm for massively parallel processing systems is presented. The P processors are configured as a "virtual" processing cube with dimensions p1, p2, and p3 proportional to the matrices' dimensions M, N, and K. Each processor performs a single local matrix multiplication of size M/p1 × N/p2 × K/p3. Before the local computation can be carried out, each subcube must receive a single submatrix of A and B. After the single matrix multiplication has completed, K/p3 submatrices of this product must be sent to their respective destination processors and then summed together with the resulting matrix C. The 3D parallel matrix multiplication approach has a factor of P^(1/6) less communication than the 2D parallel algorithms. This algorithm has been implemented on IBM POWERparallel™ SP2™ systems (up to 216 nodes) and has yielded close to the peak performance of the machine. The algorithm has been combined with Winograd's variant of Strassen's algorithm to achieve performance which exceeds the theoretical peak of the system. (We assume the MFLOPS rate of matrix multiplication to be 2MNK.)
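The block structure described in the abstract can be illustrated with a small serial simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function names (`matmul`, `block`, `multiply_3d`) are hypothetical, the "processors" are just loop indices (i, j, k) over a p1 × p2 × p3 cube, the matrix dimensions are assumed to divide evenly, and no real communication or data distribution is modeled.

```python
# Serial simulation of a 3D block decomposition of C = A * B.
# Assumption: A is M x K, B is K x N, and p1, p2, p3 divide M, N, K exactly.
# All function names are hypothetical, chosen for illustration only.

def matmul(X, Y):
    """Plain triple-loop product of two list-of-lists matrices."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[t] * Y[t][c] for t in range(inner)) for c in range(cols)]
            for row in X]

def block(X, r0, r1, c0, c1):
    """Copy the submatrix X[r0:r1, c0:c1]."""
    return [row[c0:c1] for row in X[r0:r1]]

def multiply_3d(A, B, p1, p2, p3):
    M, K, N = len(A), len(B), len(B[0])
    assert len(A[0]) == K, "inner dimensions must match"
    assert M % p1 == 0 and N % p2 == 0 and K % p3 == 0
    bm, bn, bk = M // p1, N // p2, K // p3
    C = [[0] * N for _ in range(M)]
    for i in range(p1):
        for j in range(p2):
            for k in range(p3):
                # Virtual processor (i, j, k) holds one block of A and one of B
                # and performs one local (M/p1) x (N/p2) x (K/p3) product.
                a = block(A, i * bm, (i + 1) * bm, k * bk, (k + 1) * bk)
                b = block(B, k * bk, (k + 1) * bk, j * bn, (j + 1) * bn)
                partial = matmul(a, b)
                # Partial products along the third dimension are summed
                # into the matching block of C.
                for r in range(bm):
                    for c in range(bn):
                        C[i * bm + r][j * bn + c] += partial[r][c]
    return C

# Small integer example: M=4, K=6, N=4 on a 2 x 2 x 3 virtual cube.
A = [[r * 6 + c + 1 for c in range(6)] for r in range(4)]
B = [[(r + 2 * c) % 5 for c in range(4)] for r in range(6)]
C = multiply_3d(A, B, p1=2, p2=2, p3=3)
assert C == matmul(A, B)
```

In a distributed setting the inner accumulation would be a reduction across the p3 processors sharing the same (i, j) coordinates; here it is an in-place sum. As plain arithmetic on the figures quoted in the abstract, P = 216 gives P^(1/6) = sqrt(6), roughly 2.45.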
Related resources
A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure
The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; a method that can run on all structures, especially the Fibonacci Hypercube structure, is therefore necessary for parallel matr...
Parallel Matrix Multiplication: A Systematic Journey
We expose a systematic approach for developing distributed-memory parallel matrix-matrix multiplication algorithms. The journey starts with a description of how matrices are distributed to meshes of nodes (e.g., MPI processes), relates these distributions to scalable parallel implementation of matrix-vector multiplication and rank-1 update, continues on to reveal a family of matrix-matrix multi...
Minimizing the Communication Time for Matrix Multiplication on Multiprocessors
We present one matrix multiplication algorithm for two-dimensional arrays of processing nodes, and one algorithm for three-dimensional nodal arrays. One-dimensional nodal arrays are treated as a degenerate case. The algorithms are designed to utilize fully the communications bandwidth in high degree networks in which the one-, two-, or three-dimensional arrays may be embedded. For binary n-cube...
Communication-Efficient Parallel Dense LU Using a 3-Dimensional Approach
We present new communication-efficient parallel dense linear solvers: an LU factorization algorithm and a triangular linear solver. The new algorithms perform asymptotically a factor of P^(1/6) less communication than existing algorithms, where P is the number of processors. The new algorithms employ a 3-dimensional (3D) approach, which has been previously applied only to matrix multiplication. ...
Two-dimensional cache-oblivious sparse matrix-vector multiplication
In earlier work, we presented a one-dimensional cache-oblivious sparse matrix–vector (SpMV) multiplication scheme which has its roots in one-dimensional sparse matrix partitioning. Partitioning is often used in distributed-memory parallel computing for the SpMV multiplication, an important kernel in many applications. A logical extension is to move towards using a two-dimensional partitioning. ...
Journal:
- IBM Journal of Research and Development
Volume 39
Publication date: 1995